Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pretty swiftidentifier #61

Merged
merged 6 commits into from
Aug 20, 2017
Merged

Pretty swiftidentifier #61

merged 6 commits into from
Aug 20, 2017

Conversation

djbe
Copy link
Member

@djbe djbe commented Aug 17, 2017

Fixes #30 (refs SwiftGen/SwiftGen#289).

Templates PR: SwiftGen/templates#74

@djbe
Copy link
Member Author

djbe commented Aug 19, 2017

This still doesn't achieve what we want. I've just finished modifying our templates to use swiftIdentifier:"pretty" instead of swiftIdentifier|snakeToCamelCase, but we didn't account for these cases:

  • Cyan-Color --> Cyan_Color instead of CyanColor
  • .SFNSDisplay --> _SFNSDisplay instead of SFNSDisplay
  • .SFNSDisplay --> _SFNSDisplay instead of SFNSDisplay
  • Accept-CGU --> Accept_CGU instead of AcceptCGU
  • Show-NavCtrl --> Show_NavCtrl instead of ShowNavCtrl
  • apples.count --> Apples_Count instead of ApplesCount
  • multiLine\nKey --> MultiLine_Key instead of MultiLineKey

Replacing the current "pretty" implementation from:
replace:" "|snakeToCamelCase:"true"|swiftIdentifier
with:
swiftIdentifier|snakeToCamelCase:"true"|swiftIdentifier
solves all the issues mentioned above.

@AliSoftware I know you didn't want to run the filter twice, how would you suggest solving these issues?

@AliSoftware
Copy link
Contributor

Replace not only spaces but also dashes and periods with underscore before snakeToCamelCase? Or more generally using a RegEx and use the word separator character class to split the string first?

@djbe
Copy link
Member Author

djbe commented Aug 19, 2017

Right, replace - and .. But what about \n? (see multiline key) Should the user expect those to be replaced too? Or other characters?

@AliSoftware
Copy link
Contributor

We could also use string.enumerateSubstringsInRange(range, options: NSStringEnumerationOptions.ByWords)

@AliSoftware
Copy link
Contributor

Right, replace - and .. But what about \n? (see multiline key) Should the user expect those to be replaced too? Or other characters?

That's why I mentioned the more generic answer of using some official unicode word separator 😉

@djbe
Copy link
Member Author

djbe commented Aug 19, 2017

I have to admit I don't know what that matches (words, sure, but how?). Going to read some docs 😄

@AliSoftware
Copy link
Contributor

Or CFStringTokenizer and kCFStringTokenizerUnitWord.
Or NSLinguisticTagger

(whatever is compatible with Linux would be preferable)

@AliSoftware
Copy link
Contributor

Quick Playground test:

import Foundation

extension String {
  func words() -> [String] {

    let range = self.startIndex..<self.endIndex
    var words = [String]()

    self.enumerateSubstrings(in: range, options: .byWords) { (substring, _, _, _) -> () in
      words.append(substring!)
    }

    return words
  }

  func upFirst() -> String {
    guard let first = self.characters.first else { return self }
    return String(first).uppercased() + String(self.characters.dropFirst())
  }
}

extension Array where Element == String {
  func camelJoin() -> String {
    return self.map { $0.upFirst() }.joined(separator: "")
  }
}


print("Cyan-Color".words().camelJoin())
print(".SFNSDisplay".words().camelJoin())
print("Accept-CGU".words().camelJoin())
print("Show-NavCtrl".words().camelJoin())
print("apples.count".words().camelJoin())
print("apples. count".words().camelJoin())
print("multiLine\nKey".words().camelJoin())

Gives:

CyanColor
SFNSDisplay
AcceptCGU
ShowNavCtrl
Apples.count
MultiLineKey

We could argue in considering x.y as 2 words too to make that ApplesCount instead of having Apples.count (that will become Apples_count once swiftIdentifier'd) but the rest seems consistent

@AliSoftware
Copy link
Contributor

AliSoftware commented Aug 19, 2017

If I use that addition:

  func punctWords() -> [String] {
    return self.components(separatedBy: .punctuationCharacters).flatMap({ $0.words() })
  }

Then we got:

"apples.count".punctWords().camelJoin()
// ApplesCount
"a_b.c.d-e_f".punctWords().camelJoin()
// ABCDEF
"foo_bar.baz.qux-yay".punctWords().camelJoin()
// FooBarBazQuxYay

@AliSoftware
Copy link
Contributor

After all from a logical standpoint, that's exactly what we want: having a string that is CamelCased, that is upper case at… word boundaries. Be that boundaries be an underscore, a dash, a period, or similar. After all that's how we CamelCase strings in our head, right?

@AliSoftware
Copy link
Contributor

AliSoftware commented Aug 19, 2017

You know what? Even string.components(separatedBy: NSCharacterSet.alphanumerics.inverted).camelJoin() could be sufficient actually, it seems

func test(strings: [String], apply: (String) -> String) {
  for s in strings {
    let orig = s.replacingOccurrences(of: "\n", with: "\\n").padding(toLength: 20, withPad: " ", startingAt: 0)
    let res = apply(s)
    print("\(orig) --> \(res)")
  }
}

let inputs = ["Cyan-Color", ".SFNSDisplay", "Accept-CGU", "Show-NavCtrl", "apples.count", "apples. count", "multiLine\nKey","foo_bar.baz.qux-yay"]

test(strings: inputs) {
  $0.components(separatedBy: NSCharacterSet.alphanumerics.inverted).camelJoin()
}
/*
Cyan-Color           --> CyanColor
.SFNSDisplay         --> SFNSDisplay
Accept-CGU           --> AcceptCGU
Show-NavCtrl         --> ShowNavCtrl
apples.count         --> ApplesCount
apples. count        --> ApplesCount
multiLine\nKey       --> MultiLineKey
foo_bar.baz.qux-yay  --> FooBarBazQuxYay
*/

@AliSoftware
Copy link
Contributor

Ok, continuing the thought, maybe the logical thing to do would be to split using any character that is invalid as a swift identifier character indeed, instead of splitting using any character that is not an alphanumeric. So that we keep any emoji or whatnot untouched.

But then guess what… that would be equivalent of "splitting the string using non-swift-char boundaries, then capitalising the first char of each "word" found, then joining"… which would actually be equivalent of swiftIdentifier|camelCase|swiftIdentifier in the end… so maybe you were right all along and I just needed to convince myself… 😇

@djbe
Copy link
Member Author

djbe commented Aug 19, 2017

That then breaks things for strings such as HEADER_TITLE --> HEADERTITLE, and other issues with full uppercase keys.

This works though:

		string = string
			.components(separatedBy: NSCharacterSet.alphanumerics.inverted)
			.joined(separator: "_")
		string = try snakeToCamelCase(string, stripLeading: true)
		return StencilSwiftKit.swiftIdentifier(from: string, replaceWithUnderscores: true)

@djbe
Copy link
Member Author

djbe commented Aug 19, 2017

Oh, you beat me to it, indeed, emoji 🤷‍♂️

@djbe
Copy link
Member Author

djbe commented Aug 19, 2017

What we could do, is:

  • swiftidentifier
  • snakeToCamelCase:"true"
  • titlecase
  • prefix with _ if needed

So, move the last 2 items from swiftidentifier into a loose function that we can call separately.

@djbe
Copy link
Member Author

djbe commented Aug 19, 2017

Actually, ignore that titlecase after the snake filter, it isn't needed.

@AliSoftware
Copy link
Contributor

Ah, looks like a good idea! 👍

Transforms an arbitrary string into a valid Swift identifier (using only valid characters for a Swift identifier as defined in the Swift language reference). It will apply the following rules:
This filter has a couple of modes that you can specifiy using an optional argument (defaults to "normal"):

**normal**: Transforms an arbitrary string into a valid Swift identifier (using only valid characters for a Swift identifier as defined in the Swift language reference). It will apply the following rules:

- Uppercase the first character.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, really, does that filter actually uppercase the first character all the time in our code (even since Swift 3)? Shouldn't by anymore I think… (if it's still the case, might be worth using that PR to make the normal mode correct again and stop uppercasing while there's not reason to?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That... would be a breaking change TBH. And for what? Types will still need an uppercase first letter, so we'll still need to apply a filter.

That's what I meant to say in the discussion in #30, add enum cases for swift versions, and maybe even for the type of identifier (variable, case, enum, struct, etc...). Maybe not so extensive (not so many options), but could be possible.

Although upper first letter is a decent default.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well it's debatable. The aim of swiftIdentifier is only to make the string valid Swift. Prettifying the string is nice too, but transforming too much means that if we don't want that transformation we have to apply the reverse of that transform afterwards to cancel it… And that reverse transform may not always exist.

Just feels odd to over-transform when half of the cases would need it, sure, but the other half doesn't and would need to revert it, so why force to do it in the first place instead of letting it be opt-in is what I mean. The history of this comes back from Swift 2 of course, but we're passed that now.

But indeed that would be a breaking change so maybe let's keep that for a later PR :-/

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thing is, snakeToCamelCase would uppercase the first letter anyway, no way around that.

And I'm not talking about swift 2, but even swift 3/4 types need an uppercase first letter. Only variables and cases don't.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, Right

README.md Outdated
@@ -36,7 +36,7 @@
* `removeNewlines`: Removes newlines and other whitespace characters, depending on the mode ("all" or "leading").
* `replace`: Replaces instances of a substring with a new string.
* `snakeToCamelCase`: Transforms text from snake_case to camelCase. By default it keeps leading underscores, unless a single optional argument is set to "true", "yes" or "1".
* `swiftIdentifier`: Transforms an arbitrary string into a valid Swift identifier (using only valid characters for a Swift identifier as defined in the Swift language reference)
* `swiftIdentifier`: Transforms an arbitrary string into a valid Swift identifier (using only valid characters for a Swift identifier as defined in the Swift language reference). In "pretty" mode, it will also first apply the snakeToCamelCase filter.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will not apply the snakeToCamelCase filter first, but last, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we changed the order around in the last commit.

return prefixWithUnderscoreIfNeeded(string: result, forbiddenChars: exceptions)
}

func prefixWithUnderscoreIfNeeded(string: String,
Copy link
Contributor

@AliSoftware AliSoftware Aug 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a fan of that function name, because it doesn't tell when it's needed — it doesn't tell that it makes sense only in the context of a Swift identifier.
But I have to admit that I'm not inspired either for a better name, so… ¯\_(ツ)_/¯


let chars = string.unicodeScalars
let firstChar = chars[chars.startIndex]
let prefix = !head.longCharacterIsMember(firstChar.value) && tail.longCharacterIsMember(firstChar.value) ? "_" : ""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I remember why we have that test logic.
Shouldn't testing against the valid characters for head be enough here to determine if we must prefix with _ or not? If the first character is not in the set of valid head chars, then what's the point of testing it against tail, we will prefix it with _ anyway…

And then if that tail is indeed not needed, we should probably split the identifierCharacterSets function returning a tuple into 2 functions returning a single set each I guess.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That bit of code used to be in the swiftIdentifier(from:...) method, and it had to check if it was still a valid character, otherwise that character was going to be replaced by a _.

I think (not 100% sure) that the check for tail can be removed here.

@djbe djbe force-pushed the feature/pretty-swiftidentifier branch from 418a2bf to 6fb9e7a Compare August 20, 2017 00:23
@djbe djbe merged commit fe07cd4 into master Aug 20, 2017
@djbe djbe deleted the feature/pretty-swiftidentifier branch August 20, 2017 15:47
@djbe djbe added this to the 2.1.0 milestone Aug 20, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants